Abstract

Consider the problem of source coding in networks with multiple receiving terminals,
each having access to some kind of side information. In this case, standard coding
techniques are either prohibitively complex to decode, or require network-source coding
separation, resulting in sub-optimal transmission rates. To alleviate this problem,
we offer a joint network-source coding scheme based on matrix sparsification at the
code design phase, which allows the terminals to use an efficient decoding procedure
(syndrome decoding using LDPC), despite the network coding throughout the network.
Via a novel relation between matrix sparsification and rate-distortion theory, we
give lower and upper bounds on the best achievable sparsification performance. These
bounds allow us to analyze our scheme, and, in particular, show that in the limit where
all receivers have comparable side information (in terms of conditional entropy), or,
equivalently, have weak side information, a vanishing density can be achieved. As a
result, efficient decoding is possible at all terminals simultaneously. Simulation results
motivate the use of this scheme at non-limiting rates as well.
1 Introduction

In this work, we consider the problem of efficient distributed source coding for large networks
with multiple receiving terminals. Assume that information source $X$ is available at some
source node $s$ in a network with noiseless links. The source is to be distributed (with
arbitrarily small error probability) to several receiving nodes in the network, $T = \{t_1, \ldots, t_K\}$,
where each node $t \in T$ has some side information $Y^t$ available. We assume that for each $t$,
$(X, Y^t)$ have some known joint distribution. An example is given in Figure 1.

This problem arises in a multitude of networking applications, such as sensor networks, 
peer-to-peer, and content distribution networks. A few examples to keep in mind are
a sensor in a network distributing a temperature measurement while each receiving node has
its own altitude reading (which is highly correlated), or a video file streamed distributively in
a network where receivers may have previous/noisy/low-resolution versions of that stream.

The problem of lossless coding with side information has been studied extensively. We
give here only a brief overview of the closely related works. In [1], Slepian and Wolf considered
the problem of separately encoding two correlated sources and joint decoding (Figure 2(a)).
The asymmetric case where side information $Y$ is available at the decoder is a special case
(Figure 2(b)). In [2], Ho et al. considered the multicast problem with correlated sources
(Figure 3(a)), and completely characterized the rate region for this problem: the set of
required link capacities to support the multicast. In a way, [2] can be viewed as extending
the Slepian-Wolf problem to arbitrary networks through network coding. Further extensions
also appeared in [3] and [4] (Figure 3(b)).

The canonical, three-node network of Slepian and Wolf [1], as well as its extension in [2],
do not consider practical decoding algorithms. Such algorithms were considered in [5] using
Turbo codes, in [6] based on the Wyner scheme [7], and in [8], [9], [10] based on Low-Density
Parity-Check Codes (LDPC), including a solution for any point on the rate region [11].
In a recent study [12], [13], the authors use linear-programming based decoding techniques.
An algebraic (Reed-Solomon-based) coding scheme was suggested in [14]. Nevertheless, a
major disadvantage of applying decoding techniques based on structured linear codes to a
networked environment is the need to separate network and source coding. This separation
can fail for most demand structures [15] (that is, will require higher rates than the necessary
cut-set bounds). Hence, while the distributed source coding model was studied extensively in
the information theory literature throughout the years, a huge gap still exists when trying to
apply the efficient decoding schemes to large networks with many terminals, where network
coding is essential to achieve maximal throughput.

First introduced by Ahlswede et al. in [16], network coding deals with various coding
operations that can be performed at intermediate nodes in the network in order to achieve
certain rate goals. For linear network coding [17], [18], decoding amounts to solving a set of
linear equations. When all terminals can solve and reconstruct the original data, the
application layer above the network code can choose the source coding scheme independently, as
first the network code is decoded, and then the source code. This is a separation-based
coding scheme. However, in general, separation schemes fail to achieve the maximal throughput
in the network (e.g., a rank equal to the max-flow at each node [17]), and an extra rate is
required if each terminal wishes to decode its own subset of bits. In this case we say that
separation fails [15]. The solution is thus one of the following: use extra rate (compared
to the max-flow bound), or use a joint network-source code, where the network and source
codes are matched and are designed to be decoded together. To date, joint network-source
coding schemes are prohibitively complex, while efficient decoding schemes require a
structured source code, which is shattered by the lack of separation. A midpoint approach was
taken in [19] using minimum-cost optimization, giving a trade-off between joint and separate
network-source coding. In [20], a different approach was taken, yet still sub-optimal in terms
of the required rates. Alternatively, in [21], [22] the authors suggest a sum-product-based
algorithm; nevertheless, in this case, efficient decoding requires a network with bounded-degree
nodes. Hence, our goal in this work is to design efficient coding schemes which are both
rate-optimal and applicable to general network topologies.
Main Contribution: We formally define the setting for joint network-source coding in
a network with side information at multiple terminals. We design a joint coding scheme
that, under certain constraints, both achieves the lower bounds on the link capacities and
facilitates efficient decoding at the terminals, by inducing, at all terminals simultaneously, a
source code which is based on a low-density parity-check matrix, and hence can be decoded
efficiently. This joint network-source code is achieved by a sparsification procedure of the
source coding matrix and the network transfer matrix together, at the code-design stage.
Our analysis of the sparsification performance is based on a novel connection between matrix
sparsification and rate-distortion theory, a connection which, besides being interesting on
its own, allows us to both analyze the best possible sparsification performance and give a
randomized algorithm to approximate it. Numerical results also illustrate the efficiency of
the derived codes and algorithms. Furthermore, this connection gives a novel randomized
algorithm for coding with a fidelity criterion.

The rest of the paper is organized as follows. Section 2 gives the required notation and
related results, and describes the model of distributed source coding with several receivers,
including a discussion of the main difficulties. Section 3 gives our main results. Section 4
describes the coding algorithm. Section 5 gives the numerical results and Section 6 concludes
the paper.

2 Preliminaries 

Network and Source Model: A network is defined as a directed acyclic graph $(V, \mathcal{E})$,
where $V$ is the set of vertices (nodes) and $\mathcal{E} \subseteq V \times V$ is the set of edges (links). Associated
with each edge $e \in \mathcal{E}$ is a capacity $c(e) > 0$. For any node $v$, we denote the set of incoming
and outgoing edges of node $v$ by $In(v)$ and $Out(v)$, respectively.

Let $\{X_i\}_{i=1}^{\infty}$ be a sequence of independent and identically distributed random variables
with alphabet $\mathcal{X}$. Source $\{X_i\}$ is available at node $s \in V$. The case of multiple sources
can be dealt with using similar techniques, by coding for points on the rate region. Let $T \subseteq V$,
$|T| = K$, be a set of terminals. A terminal $t \in T$ has (side) information $\{Y_i^t\}_{i=1}^{\infty}$ available
to it. For each $t$, $\{Y_i^t\}_{i=1}^{\infty}$ is a sequence of independent and identically distributed random
variables with alphabet $\mathcal{Y}$. We assume that the pairs $(X_i, Y_i^t)$ may be dependent, with some
known joint distribution $p(x, y^t)$, yet for different time indices $i$ and $j$, $(X_i, Y_i^t)$ and $(X_j, Y_j^t)$
are independent (that is, a memoryless model). We assume that the node $s$ has no incoming
edges and that each terminal node in $T$ has no outgoing edges.

For any vector of rates $(R_e)_{e \in \mathcal{E}}$, a $((2^{nR_e})_{e \in \mathcal{E}}, n)$ code comprises the following mappings:
$g_e^n : \mathcal{X}^n \to \{1, \ldots, 2^{nR_e}\}$, for $e \in Out(s)$, and
$g_e^n : \prod_{e' \in In(v)} \{1, \ldots, 2^{nR_{e'}}\} \to \{1, \ldots, 2^{nR_e}\}$, for
$e \in Out(v)$, $v \ne s$. That is, the source codes a block of $n$ inputs and maps it to messages
on its outgoing links. Each internal node may code its inputs and, again, map them to
messages on the outgoing links. Thus, for each terminal $t \in T$, $\prod_{e \in In(t)} \{1, \ldots, 2^{nR_e}\}$ is the
set of inputs to that terminal. For simplicity, we denote this input by $f_t(X^n)$. $f_t(X^n)$,
together with $(Y_1^t, \ldots, Y_n^t) = Y^{t,n}$, is the data the terminal uses for decoding.

Throughout this work, we use linear network coding, that is, source symbols are mapped
to a vector of length $w$ over some finite field $F$. An edge $e \in Out(v)$, $v \in V$, carries an element
of the field, which is a linear combination (over $F$) of the variables on all edges $e' \in In(v)$.
Note that this "unit capacity" assumption does allow for any rational capacity by adding
parallel edges and normalizing over several network uses. It is convenient to represent the
input to the source as $w$ imaginary incoming edges. Under this notation, $f_e$ denotes the
global coding vector for edge $e$ (its coefficients with respect to the input variables) and, for
some $t \in T$, $V_t$ denotes the linear space spanned by $\{f_e : e \in In(t)\}$.

For nodes $s$ and $t$, we denote by $\mathrm{maxflow}(s, t)$ the capacity of the maximal flow between
$s$ and $t$. This flow is equal to the capacity of the minimal cut separating $s$ from $t$.
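
As an illustration only, the max-flow between $s$ and a terminal can be computed with a standard augmenting-path method. The Python sketch below uses the Edmonds-Karp algorithm (a textbook technique, not something prescribed by this paper) on a hypothetical unit-capacity butterfly network; the node names and the `max_flow` helper are assumptions made for the example.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp max-flow on a dict-of-dicts capacity map (illustrative helper)."""
    # Build the residual graph, adding zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Find the bottleneck capacity along the path, then augment.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical butterfly network with unit-capacity links.
butterfly = {
    "s": {"a": 1, "b": 1},
    "a": {"t1": 1, "c": 1},
    "b": {"t2": 1, "c": 1},
    "c": {"d": 1},
    "d": {"t1": 1, "t2": 1},
}
flows = {t: max_flow(butterfly, "s", t) for t in ("t1", "t2")}
```

In this toy network both terminals have max-flow 2, which is also the value of the minimal cut separating `s` from each of them.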

Definition 1 ([23]). A $w$-dimensional linear network code on an acyclic network is a linear
broadcast if for every non-source node $t$, $\mathrm{rank}(V_t) = \min\{w, \mathrm{maxflow}(s, t)\}$.

Corollary 1 ([23]). For any acyclic network and sufficiently large base field $F$, there exists
a $w$-dimensional linear broadcast code.
In words, for any dimension $w$ there exist a large enough base field $F$ and a linear
network code such that for any non-source node $t$, we have $f_t(\mathbf{s}) = \mathbf{s}M_t$, where $\mathbf{s}$ is the
$w$-dimensional source vector over $F$ and $M_t$ is a $w \times |In(t)|$ matrix over $F$, whose columns are
the global coding vectors $\{f_e : e \in In(t)\}$. Furthermore, $\mathrm{rank}(M_t) = \min\{w, \mathrm{maxflow}(s, t)\}$.

As the base field $F$ should be large enough to allow for the matrices $\{M_t\}$ to satisfy
Corollary 1, one cannot assume that the source alphabet $\mathcal{X}$ is of the same size as $F$. From
this point on, we assume that $\mathcal{X} = \{0, 1\}$ and that $F = GF(2^m)$ for sufficiently large $m$. In
this case, $wm$ consecutive input bits are mapped to a source vector $\mathbf{s}$.

Proposition 1. Let $N = (V, \mathcal{E}, s, T)$ be a network with a source $s \in V$ and terminals
$T \subseteq V \setminus s$. Assume $(V, \mathcal{E})$ is a directed acyclic graph and that $c(e) = 1$ for any $e \in \mathcal{E}$. Set
$w = \max_{t \in T} \mathrm{maxflow}(s, t)$ and a sufficiently large $m$. Then, assuming bits $b_1, \ldots, b_{mw}$ are
available at the source $s$, there exists a linear network code over base field $GF(2^m)$ such that
any terminal $t \in T$ receives $m \cdot \mathrm{maxflow}(s, t)$ linearly independent equations on $b_1, \ldots, b_{mw}$.

In other words, if bits $b_1, \ldots, b_{mw}$ are available at the source, a terminal $t$ with
$\mathrm{maxflow}(s, t) = l \le w$ can receive (possibly after linear operations at that terminal) a vector
$\mathbf{y}$ of $ml$ bits, which is the result of $(b_1, \ldots, b_{mw})B$, for $B \in \{0, 1\}^{mw \times ml}$ with rank $ml$.
The proof of Proposition 1 is rather technical and appears in Appendix A.

Linear Codes for Structured Binning: Consider the source coding problem in Figure
2(a). By [1], the set of achievable rates is given by $R_1 \ge H(X|Y)$, $R_2 \ge H(Y|X)$ and
$R_1 + R_2 \ge H(X, Y)$. In the particular case where the side information $Y$ is available
at the decoder (Figure 2(b)), a rate $H(X|Y)$ is required. Now, assume $X$ is a binary
symmetric random variable, related to $Y$ through a binary symmetric channel. That is,
$Y = X \oplus E$, where $E$ is a binary random variable with $\Pr(E = 1) = p$ and $\oplus$ denotes the XOR
operation. Note that in this model $H(X|Y) = h(p)$, where $h$ is the binary entropy function
$h(p) = -p \log p - (1 - p) \log(1 - p)$. In this case, a coding scheme based on linear codes was
suggested by Wyner [7] (see also [24]).
Fact 1. Let $\epsilon > 0$ be arbitrary. For $n$ sufficiently large, there exists an $n \times m$ matrix $H$ with
$m < (h(p) + \epsilon)n$ and a function $f : \{0, 1\}^m \to \{0, 1\}^n$ such that $\Pr(f(e^n H) \ne e^n) < \epsilon$.

The source vector is thus encoded as $x^n H$, while, at the decoder, $e^n H$ is calculated
according to $e^n H = x^n H \oplus y^n H$ (as $y^n$ is available) and $x^n$ is then reconstructed using
$x^n = y^n \oplus f(e^n H)$. Hence, the decoding process is analogous to the decoding of a linear code
with a parity check matrix $H$. For example, in [8], LDPCs are used. As a result, $H$ is also
required to be sparse and of a certain structure.
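
To make the syndrome-based scheme above concrete, the following Python sketch runs it on a toy instance. It is only an illustration under stated assumptions: the matrix `H` is a hypothetical (7,4) Hamming parity-check matrix (not an LDPC matrix and not the code used in this paper), the decoder `f` is a simple lookup that corrects at most one error, and all function names are ours.

```python
N, M = 7, 3

# Assumption: row i of H is the binary expansion of i + 1, so each single-bit
# error pattern has its own non-zero syndrome (Hamming(7,4) parity checks).
H = [[((i + 1) >> k) & 1 for k in range(M)] for i in range(N)]

def syndrome(v):
    """Row-vector product v H over GF(2)."""
    return tuple(sum(v[i] * H[i][k] for i in range(N)) % 2 for k in range(M))

def f(s):
    """Map a syndrome to the lowest-weight error pattern that produces it."""
    e = [0] * N
    index = sum(bit << k for k, bit in enumerate(s))
    if index:
        e[index - 1] = 1
    return e

def encode(x):
    """The encoder sends only the M-bit syndrome x H."""
    return syndrome(x)

def decode(sx, y):
    """Recover x from its syndrome and the side information y = x xor e."""
    se = tuple(a ^ b for a, b in zip(sx, syndrome(y)))  # e H = x H xor y H
    return [yi ^ ei for yi, ei in zip(y, f(se))]        # x = y xor f(e H)

x = [1, 0, 1, 1, 0, 0, 1]
y = x[:]
y[4] ^= 1  # side information differs from x in one position
recovered = decode(encode(x), y)
```

Here 3 bits are sent instead of 7, and decoding succeeds whenever `x` and `y` differ in at most one position.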

Distributed Source Coding with Multiple Terminals: We now turn to our original
problem. In terms of the achievable rates alone (regardless of the possibility for efficient
decoding) the problem can be viewed as multicast in the presence of side information. The
following can be seen as a corollary of [4].

Corollary 2. Let $N = (V, \mathcal{E}, s, T)$ be a network with a source $s \in V$ and terminals $T \subseteq V \setminus s$.
Assume $(V, \mathcal{E})$ is a directed acyclic graph. Let source $X$ be available at $s$ and, for each
$t$, side information $Y^t$ be available at $t \in T$. Then, a necessary and sufficient condition
for $X$ to be reconstructed at the terminals with an arbitrarily small probability of error is
$\mathrm{maxflow}(s, t) > H(X|Y^t)$ for all $t$.

A possible scheme for achieving the bound in Corollary 2 is random binning at the
source with linear network coding [4]. In this case, it is easy to construct a source coding
scheme which is oblivious to the network code, in the sense that a terminal $t$ does not care
whether it receives the $k$ most significant bits of a bin index, or any $k$ linearly independent
equations on the $n > k$ bits of the bin index. Another possible scheme uses linear coding
and minimum entropy decoding [2]. Note, however, that the decoding complexity of both
schemes is exponential in the block length. Should, however, one try to use linear codes with
a structure that facilitates efficient decoding, for more than two terminals the structure of
the code will be destroyed by the network code.
To see this, assume $n$ bits $x^n$ are available at the source. Denote by $w_t$ the max-flow to
terminal $t$ and by $w$ the maximal max-flow among all terminals, $w = \max_t w_t$. By Corollary 2,
there exists a (joint) network-source coding scheme for which each terminal $t$ can decode
$x^n$ with an arbitrarily small (as $n \to \infty$) error probability as long as $w_t > H(X|Y^t) + \epsilon$.
However, due to the lack of network-source coding separation [15], even if source node
$s$ creates $K$ nested codewords, where the codeword intended for $t$ is of length $nH(X|Y^t)$
(assuming $nH(X|Y^t)$ is an integer) and is a subset of the codewords for terminals with
higher rates, in general, there does not exist a network code to ensure that each terminal $t$
indeed receives the subset of bits intended for it while satisfying the min-cut bounds (this can
be done, in general, for at most two terminals [25]). By Proposition 1, however, there exists
a network code for which terminal $t$ does receive $nH(X|Y^t)$ linearly independent equations
on the $n \max_{t'} H(X|Y^{t'})$ transmitted bits. In other words, source node $s$ can generate a
single codeword of length $n \max_{t'} H(X|Y^{t'})$, using a good parity check matrix $H$ of size
$n \times n \max_{t'} H(X|Y^{t'})$. Denote this codeword by $\mathbf{c} = x^n H$.
Each $m$ bits in $\mathbf{c}$ are mapped to a symbol in $GF(2^m)$, and a vector of $w$ symbols, $\mathbf{s}$, is
transmitted through the network. At terminal $t$, the received vector is $\mathbf{y} = \mathbf{s}M_t$. We assume
$wm$ divides $n \max_{t'} H(X|Y^{t'})$, so $\mathbf{c}$ might be transmitted using several network uses. By
Proposition 1, the $nH(X|Y^t)$ bits received at $t$ can be represented as
$(b_1, \ldots, b_{nH(X|Y^t)}) = x^n H B_t$, where $B_t$ is an $n \max_{t'} H(X|Y^{t'}) \times nH(X|Y^t)$ matrix
with full rank (if $|In(t)| > w_t$, one can take $w_t$ linearly independent coding vectors and
discard the rest).

While $H$ is a parity-check matrix designed to facilitate efficient decoding of $e^n$ from $e^n H$,
$B_t$ is a matrix defined by the network code, and may be different for each terminal $t$. As
a result, even if $H$, for example, is sparse, $HB_t$ might not be sparse at all and lack any
structure that allows efficient decoding. This is the reason we say that the structure of the
source code is shattered by the network code.
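
The loss of sparsity can be observed numerically. The Python sketch below is a toy experiment under our own assumptions: `H` is a random sparse binary matrix with exactly three ones per column, and `B_t` is a uniformly random binary matrix standing in for the matrix induced by the network code (it is not derived from an actual network, and its rank is not checked). All sizes and names are hypothetical.

```python
import random

def matmul_gf2(X, Y):
    """Matrix product over GF(2) for matrices given as lists of rows."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) % 2
             for j in range(len(Y[0]))] for i in range(len(X))]

def density(M):
    """Fraction of non-zero entries in M."""
    return sum(sum(row) for row in M) / (len(M) * len(M[0]))

rng = random.Random(0)      # fixed seed so the experiment is repeatable
n, k, k_t = 60, 30, 20      # hypothetical dimensions: H is n x k, B_t is k x k_t

# Sparse H: three ones in every column.
H = [[0] * k for _ in range(n)]
for col in range(k):
    for row in rng.sample(range(n), 3):
        H[row][col] = 1

# Dense stand-in for the network-induced matrix B_t.
B_t = [[rng.randint(0, 1) for _ in range(k_t)] for _ in range(k)]

HB = matmul_gf2(H, B_t)
density_H, density_HB = density(H), density(HB)
```

With these parameters `density_H` equals 0.05 by construction, while the product `HB` is several times denser, which is the effect described above.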
3 Efficient Decoding Using LDPC

The focus is thus on designing $H$ (the source code), a linear network code with matrices
$\{B_t\}$, $1 \le t \le K$, and possibly post-processing matrices $P_t$, such that, for each terminal $t$,
$H_t = HB_tP_t$ is a good parity-check matrix for terminal $t$, with a structure that facilitates
efficient decoding. In this section, we center our attention on $H$ and the $P_t$'s, and derive
sufficient conditions to guarantee that each terminal $t$ sees, with high probability, a
low-density parity-check matrix $H_t$, assuming $B_t$ is the result of either deterministic or random
linear network coding.

At the heart of the suggested scheme is a matrix sparsification algorithm and its analysis
for high rates (almost square matrices). In short, for each $HB_t$, we wish to find a matrix
$P_t$ such that $HB_tP_t$ is sparse. In [26], [27], the authors consider randomized matrix
sparsification algorithms. If $A$ is an $n \times (n - k)$ matrix, the authors seek an $(n - k) \times (n - k)$
matrix $P$ such that $AP$ is sparse. The analysis therein, however, aims at bounding the
difference between the sparsity achieved by the algorithm and the best possible performance,
without quantifying the best possible performance directly. In the next subsection, we show
a novel connection between matrix sparsification and rate-distortion theory. Through this
connection, we are able to give bounds on the best possible performance directly, and, as
a result, give sufficient conditions under which one can indeed find such a $P$ which yields a
sparse matrix $AP$. At the basis of our results are Lemmas 1 and 2, which give lower and
upper bounds on the sparsification performance. Theorem 1 below, our main result in this
section, utilizes these lemmas to show that indeed, as long as the difference in the strength of
the side information (that is, conditional entropies) available at the nodes is small, designing
joint network-source codes which induce low-density parity-check matrices at the terminals
is possible.

Theorem 1. Let $(V, \mathcal{E}, s, T)$ be a network with binary uniform source $X$ available at node $s$
and terminals $t \in T$ with side information $Y^t$. Assume $\mathrm{maxflow}(s, t) > R_t = H(X|Y^t) + \epsilon$.
Then, for large enough block length $n$, it is possible to construct a joint source and network
code such that each terminal can decode $X$ using a syndrome resulting from a parity check
matrix Ht with normalized density at most i^(maxt/ Rt^ — Rt) + 0{^^f^), where D{-) is the 
distortion rate function of a binary uniform random variable under Hamming distortion. 

This result gives a vanishing density whenever the strength of the side information at the
nodes is approximately the same. In particular, at the limit of high rates, a vanishing density
can be achieved. A small (but still useful for efficient decoding) density can be achieved when
the variation in the rates is slightly larger. Moreover, note that a naive matrix sparsification
based on Gauss elimination would result only in a linear rate-sparsity trade-off, as the lower
part of the matrix (of size $n(1 - R) \times nR$) is arbitrary. For $R \to 1$, the improvement over
Gauss elimination is arbitrarily large. To see this, note that the derivative of $D(R)$ tends to
0 as $R \to 1$, compared to the 1/2 rate achieved by Gauss elimination. For example, taking
i? = 1 — 0{^^^^^) results in a density of 0(^^). All this can be clearly seen from Figure 4, 
which compares $D(R)$ to Gauss elimination, together with the results of the sparsification
algorithm we used. 
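
The comparison between the two trade-offs can be reproduced numerically. The Python sketch below evaluates the distortion-rate function of a binary uniform source under Hamming distortion by inverting $R = 1 - h(D)$ with bisection. The Gauss-elimination reference curve is taken here as $(1 - R)/2$, which is our reading of the linear trade-off with slope 1/2 described above; the function names are hypothetical.

```python
from math import log2

def h(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def distortion_rate(R, tol=1e-12):
    """D(R) for a binary uniform source: the D in [0, 1/2] with 1 - h(D) = R."""
    lo, hi = 0.0, 0.5
    target = 1.0 - R
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < target:   # h is increasing on [0, 1/2]
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def gauss_density(R):
    """Assumed linear trade-off: an arbitrary (half-dense) lower part."""
    return (1.0 - R) / 2

rates = [0.5, 0.8, 0.9, 0.99]
table = [(R, distortion_rate(R), gauss_density(R)) for R in rates]
```

Each row of `table` holds the rate, the density suggested by $D(R)$, and the linear reference. The ratio between the last two shrinks as $R$ approaches 1.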

Theorem 1 results from analyzing both the properties of the joint network-source code
$HB_t$ seen by terminal $t$, and the best possible sparsification performance. The first part
is rather technical and is summarized in Proposition 2 at the end of this section. We now
focus on the second part and show that, at least in the randomized setting, performance
guarantees for matrix sparsification can indeed be achieved via the distortion-rate function.



3.1 Analyzing Matrix Sparsification Via Rate-Distortion 

Matrix sparsification is formally defined as follows. 

Problem 1. Given a full rank matrix $A \in GF(q)^{n \times (n-k)}$, find an invertible matrix
$P \in GF(q)^{(n-k) \times (n-k)}$ which minimizes the number of non-zeros (nnz) in $AP$.

In [26], [27], the authors prove that matrix sparsification is NP-hard, and give
approximation algorithms. Note that, in our model, matrix sparsification is part of the code design, and
not the decoding process. Hence, with successful sparsification, efficient decoding is possible
throughout the transmission of the (possibly many) source blocks.

The analysis in [26], [27] bounds the performance compared to the optimum, but does
not include any guarantee on the achieved sparsity. In the context of our problem, such
a guarantee is essential in order to verify that each terminal actually receives a syndrome
which can be seen as created by a low-density matrix.



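With a slight abuse of notation, for any $\mathbf{a}, \mathbf{b} \in GF(q)^n$ define
$d(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{n} d(a(i), b(i))$, where, for scalars $a, b \in GF(q)$, $d(a, b)$ is the
Hamming distortion (0 if $a = b$ and 1 otherwise).
It is common to denote $d(\mathbf{a}, \mathbf{b})$ as $\mathrm{nnz}(\mathbf{a} - \mathbf{b})$. At the heart of the matrix sparsification
algorithm stands the following problem.

Problem 2. Min-Unsatisfy: For any $A \in GF(q)^{n \times (n-k)}$ and $\mathbf{b} \in GF(q)^n$, find the vector
$\mathbf{x} \in GF(q)^{n-k}$ which minimizes $d(A\mathbf{x}, \mathbf{b})$.

Let $A_{-j}$ denote the matrix $A$ with the $j$-th column, $\mathbf{a}_j$, removed. Under these definitions,
a possible matrix sparsification algorithm has the following form [26].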

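Algorithm 1. For each $1 \le j \le n - k$ (that is, for each column of $A$):

1. Let $\mathbf{x} = \text{Min-Unsatisfy}(A_{-j}, \mathbf{a}_j)$.

2. Replace the column $\mathbf{a}_j$ with $\mathbf{a}_j - A_{-j}\mathbf{x}$.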
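
The two steps above can be sketched in Python for the binary case. This is a minimal illustration, assuming $q = 2$ and an exhaustive search in place of an efficient Min-Unsatisfy solver (so it only runs on very small matrices). The helper names `min_unsatisfy`, `sparsify` and `nnz` are ours and are not taken from [26].

```python
from itertools import product

def nnz(M):
    """Number of non-zero entries of a binary matrix given as a list of rows."""
    return sum(sum(row) for row in M)

def min_unsatisfy(A_rest, b):
    """Exhaustively find x minimizing the Hamming weight of b + A_rest x over GF(2)."""
    width = len(A_rest[0])
    best_x, best_weight = None, None
    for x in product((0, 1), repeat=width):
        weight = sum((bi + sum(r * xi for r, xi in zip(row, x))) % 2
                     for row, bi in zip(A_rest, b))
        if best_weight is None or weight < best_weight:
            best_x, best_weight = x, weight
    return best_x

def sparsify(A):
    """Visit the columns in order, replacing a_j by a_j - A_{-j} x (minus is plus in GF(2))."""
    A = [row[:] for row in A]            # work on a copy
    for j in range(len(A[0])):
        rest = [row[:j] + row[j + 1:] for row in A]
        a_j = [row[j] for row in A]
        x = min_unsatisfy(rest, a_j)
        for i, row in enumerate(A):
            row[j] = (a_j[i] + sum(r * xi for r, xi in zip(rest[i], x))) % 2
    return A

A = [[1, 1],
     [1, 1],
     [1, 0],
     [0, 1]]
S = sparsify(A)
```

Since $\mathbf{x} = 0$ is always a candidate, each step cannot increase the number of non-zeros. In this example the first column is replaced by its sum with the second one.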

The following lemma bounds the performance of any algorithm for solving Problem 1.

Lemma 1. Let $A \in GF(q)^{n \times (n-k)}$ be a random matrix with i.i.d. entries. Let
$P(A) \in GF(q)^{(n-k) \times (n-k)}$ be any invertible matrix, whose entries may depend on those of $A$. The
expected number of non-zeros in $AP$ satisfies $\frac{1}{(n-k)n} E\{\mathrm{nnz}(AP)\} \ge D((n - k)/n)$, where
$D(\cdot)$ is the distortion-rate function of $A(1,1)$ under Hamming distortion.

Proof. The proof is based on the converse to the rate-distortion theorem (e.g., [28, Section
10.4]). In particular, we show that for any possible choice of $P$ (which, in general, depends
on the realization of $A$), each column of $AP$ is at most as sparse (in expectation) as the
vector of differences between a column of $A$ and the closest vector to it among the best
rate-distortion code of size $q^{n-k-1}$.

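Let $P$ be a given invertible matrix over $GF(q)$. For each column $\mathbf{p}_j$ in $P$, denote by
$i^*(j)$ the index of the first non-zero entry in $\mathbf{p}_j$ and let $\lambda_j = 1/\mathbf{p}_j(i^*(j))$. Let $\Lambda$ denote
the diagonal matrix whose diagonal entries are $\{\lambda_j\}_{j=1}^{n-k}$. Finally, denote the minimizer in
Problem 2 by $\mathbf{x}_A(\mathbf{b})$, that is, $\mathbf{x}_A(\mathbf{b}) = \arg\min_{\mathbf{x}} d(A\mathbf{x}, \mathbf{b})$. We have:

$$
\begin{aligned}
\frac{1}{(n-k)n} E\{\mathrm{nnz}(AP)\} &= \frac{1}{(n-k)n} E\{\mathrm{nnz}(AP\Lambda)\} \\
&= \frac{1}{(n-k)n} E\Big\{ \sum_{j=1}^{n-k} \mathrm{nnz}(A \mathbf{p}_j \lambda_j) \Big\} \\
&\ge \frac{1}{(n-k)n} E\Big\{ \sum_{j=1}^{n-k} d\big(\mathbf{a}_{i^*(j)}, A_{-i^*(j)} \mathbf{x}_{A_{-i^*(j)}}(\mathbf{a}_{i^*(j)})\big) \Big\}. \qquad (1)
\end{aligned}
$$
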

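Now, simply consider $\mathbf{a}_{i^*(j)}$ as a source vector of length $n$, and the set
$\{A_{-i^*(j)}\mathbf{x} : \mathbf{x} \in GF(q)^{n-k-1}\}$ as a rate-distortion code of rate $(n - k - 1)/n$ (that is, having
$q^{n-k-1}$ codewords if $A_{-i^*(j)}$ has full rank). The structure of the code is, of course, defined by
the matrix $A_{-i^*(j)}$. By the converse to the rate-distortion theorem, for any subset
$\{\mathbf{b}_1, \ldots, \mathbf{b}_{q^{nR}}\} \subseteq GF(q)^n$, where $0 < R < 1$, we have

$$E\{d(\mathbf{a}_{i^*(j)}, \mathbf{b}_{\hat{\imath}})\} \ge nD(R), \qquad (2)$$

where $\hat{\imath}$ is the minimizer of $d(\mathbf{a}_{i^*(j)}, \mathbf{b}_i)$ over all $1 \le i \le q^{nR}$. A fortiori, (2) holds if the set
$\{\mathbf{b}_1, \ldots, \mathbf{b}_{q^{nR}}\}$ is not chosen optimally (herein, it is chosen as the linear subspace spanned
by the columns of $A_{-i^*(j)}$). As a result,

$$\frac{1}{(n-k)n} E\{\mathrm{nnz}(AP)\} \ge D\left(\frac{n-k-1}{n}\right) \ge D\left(\frac{n-k}{n}\right),$$

where the last inequality follows from the monotonicity of $D(R)$. □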
Note that the above proof is based on the fact that a solution to Problem 1 can be seen
as sequentially applying Problem 2 to each of the columns of $A$, with the rest of the columns
as the second argument. This is also the reason that an approximation for Problem 2 results
in an approximation for Problem 1, and bounds on the possible performance in Problem 2
give bounds on the performance in Problem 1.

In fact, in certain cases, the bound in Lemma 1 is achievable. For binary random matrices
with uniform i.i.d. entries, we have the following achievability result.

Lemma 2. Let $A \in GF(2)^{n \times (n-k)}$ be a random matrix with uniform i.i.d. entries. For any
Q < D < 1/2 which satisfies > 1 — h{D) + + ^, there exists an invertible matrix 
$P \in GF(2)^{(n-k) \times (n-k)}$ such that with probability at least $1 - (n - k)2^{-n}$, for all $1 \le j \le n - k$,
we have $\mathrm{nnz}(A\mathbf{p}_j) \le nD$. In particular, $\frac{1}{(n-k)n} \mathrm{nnz}(AP) \le D$.

Proof. At the basis of the proof stands the fact that for binary uniform random variables,
linear codes achieve the rate-distortion function [29].

First, we show how to construct a square matrix $P$ which achieves the required sparsity
of $AP$. For any matrix $C \in GF(2)^{n \times nR}$, $0 < R < 1$, define $\mathcal{C} = \{C\mathbf{x} : \mathbf{x} \in GF(2)^{nR}\}$. In
[29], it is shown that if $C$ is random with uniform i.i.d. entries, and $\mathbf{b}$ is a binary vector with
uniform i.i.d. entries, then if i? > 1 — h{D) + 2^^ we have 

$$\Pr\Big(\min_{\mathbf{c} \in \mathcal{C}} d(\mathbf{c}, \mathbf{b}) > nD\Big) < 2^{-n}. \qquad (3)$$

Note that the codeword $\mathbf{c}$ which minimizes $\min_{\mathbf{c} \in \mathcal{C}} d(\mathbf{c}, \mathbf{b})$ is simply a linear combination of
the columns of $C$, that is, $C\mathbf{x}_C(\mathbf{b})$. Hence, to construct the invertible matrix $P$, we proceed
inductively as follows. We consider $\mathbf{a}_1$ as the source vector ($\mathbf{b}$ in (3)) and $A_{-1}$ as the linear
code. Note that both are random with uniform i.i.d. entries. Define a matrix $S_1$ such that
all its columns except the first one are equal to those of the identity matrix. For the first
column, set $S_1(1, 1) = 1$ and the rest of the values as simply $\mathbf{x}_{A_{-1}}(\mathbf{a}_1)$. $S_1$ is invertible. To
continue, construct $S_j$ as follows. Set all its columns except the $j$-th equal to those of the
identity matrix. For the $j$-th column, set $S_j(j, j) = 1$ and the rest of the values as simply
$\mathbf{x}_{(AS_1 \cdots S_{j-1})_{-j}}((AS_1 \cdots S_{j-1})_j)$.

We set the matrix $P$ as simply $S_1 \cdots S_{n-k}$. Clearly, $P$ is invertible. Note that the $j$-th
column of $P$ is simply the $j$-th column of $S_1 \cdots S_j$, that is, $\mathbf{p}_j = (S_1 \cdots S_j)_j$. Moreover, this
column is chosen to minimize the weight of the sum of $(AS_1 \cdots S_{j-1})_j$ and the linear combination of the
columns of $(AS_1 \cdots S_{j-1})_{-j}$.

We now show that for each $j$, $AS_1 \cdots S_{j-1}$ remains random with uniform i.i.d. entries.
Each matrix $S_i$ can be represented as a multiplication of $s_i - 1$ matrices, $s_i$ being the
number of non-zero entries in the $i$th column of $S_i$. Denote them by $W_{S_i}^{1}, \ldots, W_{S_i}^{s_i - 1}$.
Each of these matrices differs from the identity matrix by one entry only: $W_{S_i}^{l}$ has 1 at
the entry corresponding to the $l$th non-zero element of the $i$th column of $S_i$ (excluding
the diagonal element). For example,

$$\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}.$$

Thus, $S_1 \cdots S_{j-1} = W_{S_1}^{1} \cdots W_{S_1}^{s_1 - 1} \cdots W_{S_{j-1}}^{1} \cdots W_{S_{j-1}}^{s_{j-1} - 1}$. Consider the multiplication $AW_{S_i}^{l}$.
This operation simply replaces one column in $A$ with its XOR with another column. Since
for two independent uniform bits $X$ and $Y$, $X$ and $X \oplus Y$ are also independent uniform bits,
$AW_{S_i}^{l}$ remains a matrix with binary uniform i.i.d. entries. By induction, $AS_1 \cdots S_{j-1}$ is also
uniform with i.i.d. entries.
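
The two facts used in this step can be checked mechanically on a small case. The Python sketch below is an illustration with hypothetical 3 x 3 matrices of our own choosing: it multiplies two single-entry modifications of the identity over GF(2), applies one of them to a small matrix to show the column-XOR effect, and enumerates the joint distribution of $(X, X \oplus Y)$ for independent uniform bits.

```python
from itertools import product

def matmul_gf2(X, Y):
    """Matrix product over GF(2) for matrices given as lists of rows."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) % 2
             for j in range(len(Y[0]))] for i in range(len(X))]

def elementary(size, row, col):
    """Identity matrix with one extra 1 at (row, col), zero-based indices."""
    E = [[int(i == j) for j in range(size)] for i in range(size)]
    E[row][col] = 1
    return E

# A matrix that differs from the identity only in its second column.
S = [[1, 1, 0],
     [0, 1, 0],
     [0, 1, 1]]
W1 = elementary(3, 0, 1)
W2 = elementary(3, 2, 1)
factored = matmul_gf2(W1, W2)

# Right-multiplying by W1 replaces column 2 of A by its XOR with column 1.
A = [[1, 0, 1],
     [0, 1, 1]]
AW = matmul_gf2(A, W1)

# Joint distribution of (X, X xor Y) for independent uniform bits X and Y.
joint = {}
for x, y in product((0, 1), repeat=2):
    key = (x, x ^ y)
    joint[key] = joint.get(key, 0.0) + 0.25
```

All four outcomes in `joint` have probability 1/4, so the pair $(X, X \oplus Y)$ is again a pair of independent uniform bits.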

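Utilizing the above, to compute the probability that some column of $AP$ has more than
$nD$ non-zero entries, we have

$$
\begin{aligned}
\Pr\Big( \bigcup_{1 \le j \le n-k} \{\mathrm{nnz}(A\mathbf{p}_j) > nD\} \Big)
&\le \sum_{1 \le j \le n-k} \Pr\big(\mathrm{nnz}(A\mathbf{p}_j) > nD\big) \\
&= \sum_{1 \le j \le n-k} \Pr\big(\mathrm{nnz}(A(S_1 \cdots S_j)_j) > nD\big) \\
&= \sum_{1 \le j \le n-k} \Pr\Big(\mathrm{nnz}\big((AS_1 \cdots S_{j-1})_j + (AS_1 \cdots S_{j-1})_{-j}\, \mathbf{x}_{(AS_1 \cdots S_{j-1})_{-j}}((AS_1 \cdots S_{j-1})_j)\big) > nD\Big) \\
&= (n - k) \Pr\big(\mathrm{nnz}(\mathbf{a}_1 + A_{-1}\mathbf{x}_{A_{-1}}(\mathbf{a}_1)) > nD\big) \le (n - k)2^{-n}, \qquad (4)
\end{aligned}
$$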

where the last inequahty apphes if ^^=^ > 1 - h{D) + 2^^. □ 
Remark 1. Note that if each column in $P$ is chosen separately of the others, in a way that
the $n - k - 1$ entries of $\mathbf{p}_j$ minimize the distance between $\mathbf{a}_j$ and some linear combination of
the columns in $A_{-j}$, then $P$ is not necessarily invertible. However, it is not hard to see that
all its off-diagonal entries are uniform i.i.d., while all the diagonal entries are deterministic
and equal to 1. To compute the probability that $P$ is invertible, note that the first row of $P$
has $2^{n-k-1}$ possibilities. The $i$-th row of $P$ cannot be a linear combination of the first $i - 1$
rows (there are $2^{i-1}$ such combinations), yet, since $P(i, i) = 1$, half of the combinations are
not counted. As a result, the probability that such a $P$ is invertible is

2{n-kr-{n-k) - -^W U - ^ ; u.^y. 

3.2 Proof of Theorem 1

At the basis of the proof is the following theme: we identify the properties of the transfer
matrix $HB_t$ seen by a terminal $t$. As $B_t$ is defined by the network code, hence constrained
by the network topology and, in general, cannot be optimized to our needs, we seek a proper
$H$ such that, with high probability, either $HB_t$ is sparse or it can be sparsified. Since the
constraints under which $HB_t$ may be sparse are not necessarily met (depending on $B_t$), we
take the freedom of choosing $H$ such that at least the result in Lemma 2 can be used, and
$HB_t$ can be sparsified.

The proof is based on the following proposition, which gives a qualitative description of
the properties of the transfer matrices $HB_t$ seen by each terminal $t$. The results herein are
stronger than those required to use Lemma 2, but we include them for completeness. A
proof is given in Appendix B.

Proposition 2. Assume H is an n x (n — k) random binary matrix with i.i.d. components^ 
where n — k = B(n) and Pr(i/(1, 1) = 1) = X/n for some A > 1. Then, 

1. If Bt is full rank then each column of HBt is composed of independent hits (that is, 
column-wise independence). 

15 



2. If Bt is full rank and H is chosen at random with uniform entries (X = n/2), then 
HBf has uniform i.i.d. entries. 

3. If each column in Bt contains a linear (in n) number of non-zero entries, e.g., Cjn then 
lim^^oo Pr((i^5t)(i, j) = 0) = | (l + e"^''^^), where {IIBt){iJ) denotes the {ij)th en- 
try of HBf. If in addition, A = w{l), we have 

lim,,^^FT{{HBt){i,j) = 0) = l 

4' If each column in Bt contains a sub-linear number of non-zero entries then 
lim^^^FT{{HBt){i,j) = l) = 0. 

5. Let $b_j$ and $b_{j'}$ be two columns of $B_t$. If $d(b_j, b_{j'}) = \Theta(n)$, then by setting $\lambda = \omega(n)$ the
random variables $(HB_t)(i,j)$ and $(HB_t)(i,j')$ are independent for each $i$. Hence, $HB_t$
has independent columns. If, however, $d(b_j, b_{j'}) = o(n)$, then by choosing $\lambda$ as some
positive constant, we have $\lim_{n\to\infty} \Pr((HB_t)(i,j) \neq (HB_t)(i,j')) = 0$.
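
As a quick numerical sanity check of items 3 and 4, the sketch below evaluates the exact probability that a sum of $l$ independent Bernoulli($\lambda/n$) bits is even, and compares it with the limit $\frac{1}{2}(1 + e^{-2\lambda c_j})$. The values of `lam` and `c`, the choice of $\sqrt{n}$ as a sub-linear column weight, and the helper name `prob_even` are illustrative assumptions, not taken from the paper.

```python
import math

def prob_even(l, p):
    """Exact probability that a sum of l independent Bernoulli(p) bits is even."""
    return 0.5 * (1.0 + (1.0 - 2.0 * p) ** l)

lam, c = 2.0, 0.3   # illustrative constants (assumptions, not from the paper)

# Limit claimed in item 3 for a column with c*n non-zero entries.
limit_dense = 0.5 * (1.0 + math.exp(-2.0 * lam * c))

for n in (10**2, 10**4, 10**6):
    dense = prob_even(round(c * n), lam / n)            # linear column weight
    sparse = prob_even(round(math.sqrt(n)), lam / n)    # sub-linear column weight
    print(n, round(dense, 4), round(sparse, 4))
```

As $n$ grows, the first printed column should approach `limit_dense`, while the second should approach 1, in line with item 4.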

Proof of Theorem 1. First, by Proposition 2, for each terminal $t$, the transfer matrix $HB_t$
can either be made sparse using a proper choice of $H$, or it can be made completely i.i.d.
with binary uniform entries. This is since $\lambda(n)$ is a parameter of the source code, and can
be chosen by the code designer.

Then, at the "worst case" of $HB_t$ being a uniform i.i.d. matrix, we use Algorithm 1, which
results, at the limit of infinitely many repetitions, in a sparsity ratio of $2D(R)$ (Lemma 2
and [26]). Note that by [27, Theorem 8], there are unsparsifiable matrices, that is, matrices
which cannot be made sparser than a linear rate-sparsity curve. In a sense, our results show
that under a proper choice of $H$, we can guarantee that the transfer matrices seen by the
terminals can be sparsified below the linear curve, up to the distortion-rate curve.

Remember that for $q = 2$ and b(0) being Bernoulli($p$), we have

$$R(D) = \begin{cases} h(p) - h(D) & \text{if } 0 \le D \le \min\{p, 1-p\} \\ 0 & \text{if } D > \min\{p, 1-p\} \end{cases}$$


and $D(R)$ is the inverse of $R(D)$. Thus, for example, from analyzing $D(R)$ for $R \to 1$, we
have that for $R = 1 - \ldots$, $D(R) = \ldots$



□ 
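
The distortion-rate function $D(R)$ used above has no simple closed form, but it is easy to evaluate numerically. The following is a minimal sketch, assuming a Bernoulli($p$) source under Hamming distortion and inverting $R(D)$ by bisection; the function names and the iteration count are illustrative choices.

```python
import math

def h(x):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def rate_distortion(D, p=0.5):
    """R(D) = h(p) - h(D) on [0, min(p, 1-p)], and 0 beyond that."""
    return h(p) - h(D) if 0.0 <= D <= min(p, 1.0 - p) else 0.0

def distortion_rate(R, p=0.5, iters=60):
    """Invert R(D) by bisection on [0, min(p, 1-p)], where R(D) is decreasing."""
    lo, hi = 0.0, min(p, 1.0 - p)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate_distortion(mid, p) > R:
            lo = mid   # rate still above target: more distortion is allowed
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, `distortion_rate(0.8)` and `distortion_rate(0.9)` evaluate $D(R)$ for a uniform source at rates 0.8 and 0.9.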



4 A Randomized Algorithm for Linear Rate-Distortion 

Note that through the relation between matrix sparsification and rate-distortion, we are now
able to give a randomized algorithm to approximate the closest codeword in a rate-distortion
problem (using linear codes) and assess its approximation factor compared to the optimum.
Denote by $C(I,:)$ the matrix consisting of rows $I$ of the matrix $C$, for some index set $I$.
Consider the following algorithm, whose performance guarantee is a direct application of
Lemma 2 and [26].

Algorithm 2. Input: A linear code $C$ of rate $R$ (defined by an $n \times nR$ matrix $C$). A vector
$b$. Output: A codeword $c$.

1. Randomly choose an index set $I$ of $nR$ independent rows in $C$,

2. Solve for $x$ in $C(I,:)x = b(I)$,

3. Return $Cx$.

Corollary 3. Assume $C$ and $b$ are i.i.d. binary symmetric. With probability at least e~^,
Algorithm 2 returns a codeword $c$ whose distance from $b$ is at most (^ + 2) · $nD(R)$.
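
The three steps of Algorithm 2 can be sketched over GF(2) as follows. This is a minimal illustration under stated assumptions, not the implementation used for the simulations: the function names are hypothetical, $C$ and $b$ are assumed to be NumPy `uint8` arrays with $C$ of full column rank, and the independent rows are picked greedily from a random row order.

```python
import numpy as np

def random_independent_rows(C, rng):
    """Step 1: pick k linearly independent rows of C (over GF(2)) in random order."""
    n, k = C.shape
    basis, chosen = [], []            # basis holds (pivot column, reduced row) pairs
    for i in rng.permutation(n):
        v = C[i].copy()
        for pivot, row in basis:      # reduce v against the rows picked so far
            if v[pivot]:
                v ^= row
        nz = np.flatnonzero(v)
        if nz.size:                   # non-zero remainder: row i is independent
            basis.append((nz[0], v))
            chosen.append(i)
            if len(chosen) == k:
                return np.array(chosen)
    raise ValueError("C does not have full column rank")

def solve_gf2(A, y):
    """Step 2: solve A x = y over GF(2) for an invertible square A (Gauss-Jordan)."""
    A, y = A.copy(), y.copy()
    for col in range(A.shape[0]):
        p = col + np.flatnonzero(A[col:, col])[0]   # pivot exists since A is invertible
        A[[col, p]] = A[[p, col]]
        y[[col, p]] = y[[p, col]]
        for r in np.flatnonzero(A[:, col]):
            if r != col:              # clear column col in every other row
                A[r] ^= A[col]
                y[r] ^= y[col]
    return y

def approx_closest_codeword(C, b, rng):
    """Algorithm 2: the codeword Cx that agrees with b on a random set of k rows."""
    I = random_independent_rows(C, rng)
    x = solve_gf2(C[I], b[I])
    return (C.astype(np.int64) @ x) % 2             # Step 3: return Cx

rng = np.random.default_rng(0)
n, k = 40, 20
# Identity block on top guarantees full column rank in this toy example.
C = np.vstack([np.eye(k, dtype=np.uint8),
               rng.integers(0, 2, size=(n - k, k), dtype=np.uint8)])
b = rng.integers(0, 2, size=n, dtype=np.uint8)
c = approx_closest_codeword(C, b, rng)
```

Since the returned codeword matches $b$ on the $nR$ chosen rows, its Hamming distance from $b$ is at most $n - nR$. Repeating the random choice and keeping the closest codeword is a natural extension of this sketch.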

5 Numerical Results 

Figure 4 gives a numerical example of how close one can get to the rate-distortion curve
using Algorithm 2. This algorithm is at the basis of the sparsification process and, in fact,
the density of the matrices we use for Figure 5 is that depicted by the approximation curves
in Figure 4. The sparsification algorithm used was Algorithm 1, with a random guess to
solve the Min-Unsatisfy problem (similar to Algorithm 2).

To depict the performance of LDPC codes created from matrix sparsification of a random
i.i.d. matrix with uniform bits, Figure 5 gives the bit error rates (BER) for decoding of such
codes. Figure 5 also shows the bit error rate for two matrices created by the following structured
approach: an identity matrix is replicated 4x5 times, followed by about 10^ permutations,
where every permutation preserves the constant number of ones in each column (4 ones) and
in each row (5 ones). While the results of LDPC codes created from matrix sparsification
are inferior to well-optimized LDPCs, note that the codes suggested in this work do not
require network-source coding separation, and hence, unlike optimized codes, can be used in
networks with multiple receivers and without the (possibly very large) excess capacity above
the min-cuts required when a separation-based scheme is used. Sparse matrices can also
be achieved for deterministic network coding matrices, resulting from the polynomial time
algorithm of [30]. For example, sparsification results for a graph of 70 nodes, with coefficients
over GF(2^), can be found in Figure 6. Note that the given percentage of non-zero elements
is over GF(2^), and not over GF(2). Figure 7 includes additional simulation results, for
decoding of random LDPCs achieved by the matrix sparsification algorithm, Algorithm 1. We
can clearly see that better sparsification yields a better bit error rate. Since, for the same
number of repetitions, Algorithm 1 gives better sparsification for smaller matrices (for a larger
matrix, more repetitions are needed to achieve a better solution for the min-unsatisfy problem),
the best BER is achieved by the smallest (128 x 115) matrix.
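
One way to read the structured construction above is sketched below. It assumes the "permutations" are 2x2 switches that move two ones onto two zeros, which is one standard way to keep every row and column weight fixed; the paper does not specify the exact operation, and the function name, block size, and number of switches are hypothetical.

```python
import numpy as np

def structured_ldpc(block, rows=4, cols=5, swaps=10_000, seed=0):
    """Tile a rows x cols grid of identity blocks, then shuffle with 2x2 switches."""
    rng = np.random.default_rng(seed)
    H = np.tile(np.eye(block, dtype=np.uint8), (rows, cols))
    ones = [tuple(rc) for rc in np.argwhere(H)]     # positions of all ones
    for _ in range(swaps):
        a, b = rng.choice(len(ones), size=2, replace=False)
        (r1, c1), (r2, c2) = ones[a], ones[b]
        # Switch only when both target cells are zero; this keeps every
        # row weight and every column weight unchanged.
        if r1 != r2 and c1 != c2 and H[r1, c2] == 0 and H[r2, c1] == 0:
            H[r1, c1] = H[r2, c2] = 0
            H[r1, c2] = H[r2, c1] = 1
            ones[a], ones[b] = (r1, c2), (r2, c1)
    return H

H = structured_ldpc(block=20, swaps=2000)
```

With `block=20` the result is an 80 x 100 binary matrix with 4 ones in each column and 5 ones in each row, regardless of how many switches are accepted.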

Figure 8 shows the density (of ones) achieved by Algorithm 1, compared to Gaussian
elimination and the rate-distortion function $D(R)$, which is the lower bound (as we have
shown in Lemma 1) for any matrix sparsification algorithm. We can see that a single repetition
of min-unsatisfy in Algorithm 1 results in almost the same sparsification achieved
by Gaussian elimination. As the number of repetitions grows, the density becomes much
closer to the lower bound $D(R)$. The matrices used in the simulation are 300 x 240 and
300 x 270, which correspond to the rates 0.8 and 0.9, respectively.

On the more practical side, we note that further sparsification can be sought using methods
to sparsify sparse matrices [31]. Moreover, the benefit in sparse network coding matrices
goes beyond the problem of coding with side information. For example, more efficient
algorithms for solving linear equations (in the simple multicast scenario) can be used [32], [33].



6 Conclusion 

In this work, we formally defined joint network-source coding for networks where multiple 
receivers have side information. We described a code-design procedure which is based on 
matrix sparsification, enabling each terminal in the network, at the limit of high rates, to 
receive codewords corresponding to low-density parity-check codes, thus facilitating efficient 
decoding. Since our scheme is not based on network-source coding separation, optimal rates, 
matching the cut-set bounds, can be achieved. Simulations performed depict encouraging 
results, also at non-limiting rates. 

(a) The Slepian-Wolf network [1]. (b) The asymmetric Slepian-Wolf network.

Figure 2: (a) Upper encoder has the source $X$, lower one has the side information $Y$, which
is, in general, correlated with $X$. We are interested in the set of rates $(R_1, R_2)$ such that
both $X$ and $Y$ can be reconstructed at the decoder. (b) The encoder describes the source $X$
to a decoder which has side information $Y$ available. We are interested in the rate $R_1$ such
that $X$ can be reconstructed at the decoder.

(a) A network with correlated sources [2]. (b) A network with correlated sources and side information [4].

Figure 3: (a) Nodes $s$ and $z$ have sources $X$ and $Y$ available. In general, $X$ and $Y$ are
correlated. The sources are demanded (losslessly) at the terminals $t_1$ and $t_2$. (b) In this
case, terminals $t_1$ and $t_2$ have additional side information available to them, $Z_1$ and $Z_2$,
respectively.

Figure 4: Approaching the distortion-rate curve using Algorithm 2.

BER curves for random and structured LDPC, Rate=0.2

Figure 5: Bit error rates for LDPCs created by applying the randomized sparsification algorithm
on binary symmetric i.i.d. matrices of size $n \times nR$, $R = 0.8$ (a rate 0.2 channel code), and
for LDPCs created by a structured construction.

Figure 6: Sparsification results for deterministic coding matrices over GF(2^).

BER curves for random LDPC

Figure 7: Bit error rates for LDPCs created by applying Algorithm 1 on binary symmetric
i.i.d. matrices of size $n \times nR$, $R = 0.9$ (a rate 0.1 channel code).

Figure 8: Approaching the minimum possible density using Algorithm 1.

A Proof of Proposition 1

Proposition 1 merely translates Corollary 1, which asserts the existence of linear network codes such
that each node $t$ can receive maxflow($s$, $t$) independent equations on the $w$ source symbols
over some base field GF($2^m$), to a binary representation. Hence, each node $t$ can receive
$m \cdot$ maxflow($s$, $t$) independent equations on the $mw$ bits. The proof is based on the following
technical claim, which shows that indeed linear equations on the elements of the source
vector over GF($2^m$) translate to linear equations on the bits in the bit representation of the
symbols. Proposition 1 will then easily follow from Corollary 1.

Claim 1. For any set of linear equations over $F = GF(2^m)$, $sM = y$, where $s \in F^w$,
$M \in F^{w \times r}$ with rank $r$ and $y \in F^r$, there exists a set of linear equations over $GF(2)$, and
a binary matrix $B \in GF(2)^{mw \times mr}$ of rank $rm$, such that the binary representations of $s$ and $y$,
denoted $b_s$ and $b_y$ (that is, the coefficients over $GF(2)$ in the representation of the elements
of $F$ as polynomials), satisfy $b_s B = b_y$ over $GF(2)$.

Proof. The system $sM = y$ represents $r$ independent equations over $F$. Each equation is of
the form

$$s_1 m_{1,i} + s_2 m_{2,i} + \ldots + s_w m_{w,i} = y_i \qquad (5)$$

(with a slight abuse of notation, $m$ represents the extension degree and $m_{i,j}$ represents the
$(i,j)$ entry of $M$). Each unknown $s_j$ can be represented as a polynomial in $x$,

$$s_j = a_j^0 + a_j^1 x + \ldots + a_j^{m-1} x^{m-1},$$

where now $\{a_j^l\}_{l=0}^{m-1}$ are $m$ unknowns over GF(2). The same holds for all entries of the matrix
$M$, yet, since $M$ is known, so are the coefficients of each $m_{i,j}$ in its polynomial representation.
Hence, (5) translates to

$$(a_1^0 + a_1^1 x + \ldots + a_1^{m-1} x^{m-1}) \cdot (m_{1,i}^0 + m_{1,i}^1 x + \ldots + m_{1,i}^{m-1} x^{m-1}) + \ldots = (y_i^0 + y_i^1 x + \ldots + y_i^{m-1} x^{m-1}). \qquad (6)$$

The l.h.s. of (6) has a maximal degree $2m - 2$, hence, the remainder modulo an irreducible
polynomial of degree $m$ is computed. This operation is linear in the coefficients of each
power of $x$, and, as a result, the l.h.s. translates to a polynomial of degree at most $m - 1$ with
coefficients which are linear functions of the unknowns $\{a_j^l\}$, $1 \le j \le w$, $0 \le l \le m-1$. When comparing the
coefficients of each power of $x$, the result is $m$ linear equations for the unknowns $\{a_j^l\}$.
Clearly, $r$ independent linear equations of the form of (5) (over GF($2^m$)) will result in $mr$
linear equations for $\{a_j^l\}$ (over GF(2)). To see that the resulting $mr$ equations are
indeed independent, note that if $w - r$ of the unknowns $s_1, \ldots, s_w$ are given, this method
must determine the values of the rest. Since the remaining $r$ unknowns over GF($2^m$) are
represented by $mr$ unknowns over GF(2), the $mr$ equations over GF(2) must be independent.

□ 
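
To make the argument of Claim 1 concrete, the sketch below works in GF($2^3$) with the irreducible polynomial $x^3 + x + 1$ (an illustrative choice, not taken from the paper) and shows that multiplication by a known field element acts on the bit representation as an $m \times m$ binary matrix. All function names are hypothetical.

```python
import numpy as np

M_DEG = 3
IRRED = 0b1011   # x^3 + x + 1, irreducible over GF(2) (illustrative choice)

def gf_mul(a, b):
    """Multiply two elements of GF(2^3), stored as ints (bit l = coefficient of x^l)."""
    prod = 0
    for l in range(M_DEG):
        if (b >> l) & 1:
            prod ^= a << l                      # carry-less polynomial product
    for deg in range(2 * M_DEG - 2, M_DEG - 1, -1):
        if (prod >> deg) & 1:
            prod ^= IRRED << (deg - M_DEG)      # reduce modulo the irreducible polynomial
    return prod

def bits(a):
    """Coefficient vector (a^0, ..., a^{m-1}) of a field element."""
    return np.array([(a >> l) & 1 for l in range(M_DEG)], dtype=np.uint8)

def mul_matrix(c):
    """m x m binary matrix T_c with bits(s) @ T_c = bits(s * c) over GF(2)."""
    return np.stack([bits(gf_mul(1 << l, c)) for l in range(M_DEG)])

# One equation s1*m1 + s2*m2 = y over GF(2^3) becomes 3 binary equations b_s B = b_y.
s1, s2, m1, m2 = 5, 6, 3, 7
B = np.vstack([mul_matrix(m1), mul_matrix(m2)])   # (2m) x m binary matrix
b_s = np.concatenate([bits(s1), bits(s2)])
y = gf_mul(s1, m1) ^ gf_mul(s2, m2)               # addition in GF(2^m) is XOR
b_y = (b_s @ B) % 2
```

Here `b_y` coincides with `bits(y)`, which is the single-equation case of the claim with $w = 2$ and $r = 1$.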



B Proof of Proposition 2

Proof sketch. 1. If $B_t$ is full rank, none of its columns is the all-zeros column. Assume
column $j$ of $B_t$ contains $l$ ones. Then, for each $i$, the entry $(i,j)$ of $HB_t$ is the sum of
$l$ independent Bernoulli($\lambda/n$) random variables, and the summands for the entry $(i,j)$
are independent of those for $(i',j)$, for any $i \neq i'$.

2. If $B_t$ is full rank, no two of its columns are identical. This means that the sums composing
$(HB_t)(i,j)$ and $(HB_t)(i,j')$ differ in at least one random variable. Since this random
variable is binary, uniform and independent of all the rest, $(HB_t)(i,j)$ and $(HB_t)(i,j')$
are also uniform and independent. For different output rows this holds trivially.

3. The event that $(HB_t)(i,j) = 0$ is the event that an even number of the Bernoulli
random variables is 1, hence

$$\Pr((HB_t)(i,j) = 0) = \frac{1}{2}\left(1 + \left(1 - \frac{2\lambda}{n}\right)^{l}\right) \qquad (7)$$

for all $i$. Assume $l = c_j n$ and compute the limit as $n \to \infty$. For large enough $\lambda$, the
distribution is arbitrarily close to uniform bits.

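4. We use the result of Item 3 above. Since the number of non-zero entries in column $j$ of
$B_t$ is sub-linear in $n$, for large enough $n$ it is smaller than any $cn$ (for arbitrarily small
$c > 0$). Thus,

$$\frac{1}{2}\left(1 + \left(1 - \frac{2\lambda}{n}\right)^{cn}\right) \le \Pr((HB_t)(i,j) = 0) \le 1. \qquad (8)$$

Taking the limit of $n \to \infty$ we have

$$\frac{1}{2}\left(1 + e^{-2\lambda c}\right) \le \lim_{n\to\infty} \Pr((HB_t)(i,j) = 0) \le 1 \qquad (9)$$

for arbitrarily small $c$, which completes the proof.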

5. If $d(b_j, b_{j'}) = \Theta(n)$, then the difference between $(HB_t)(i,j)$ and $(HB_t)(i,j')$ is a sum
of a linear number of independent Bernoulli($\lambda/n$) random variables. This sum is a
uniform bit if $\lambda = \omega(n)$, hence the random variables $(HB_t)(i,j)$ and $(HB_t)(i,j')$ are
independent for each $i$. However, if the number of summands is sub-linear, then a constant $\lambda$ results
in an asymptotically zero probability for the sum to be 1 (similar to item 4 above). This
means that the probability that $(HB_t)(i,j)$ and $(HB_t)(i,j')$ differ is small. Note that
in this case, with high probability, the sum of $(HB_t)(i,j)$ and $(HB_t)(i,j')$ is 0, hence
summing the two columns gives a sparse column.

□ 